Constrained Reinforcement Learning in Hard Exploration Problems

Authors

Abstract

One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are dependent on the policy. Recent works in constrained RL have developed methods that ensure constraints are enforced even at learning time while maximizing the overall value of the policy. Unfortunately, as demonstrated in our experimental results, such approaches do not perform well on complex multi-level tasks with longer episode lengths or sparse rewards. To that end, we propose a scalable hierarchical approach for constrained RL problems that employs backward cost value functions in the context of a task hierarchy and a novel intrinsic reward function at the lower levels to enable constraint enforcement. One of our key contributions is in proving that backward value functions remain theoretically viable when there are multiple levels of decision making. We also show that our new approach, referred to as Hierarchically Limited consTraint Enforcement (HiLiTE), significantly improves on the state of the art in Constrained RL on many benchmark problems from the literature. We further demonstrate that this performance (on constraint enforcement) clearly outperforms existing best RL approaches.
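The abstract gives only a high-level description of the method. As a rough illustration of the general idea of combining a backward (accumulated-so-far) cost with a forward cost estimate to shape a lower-level intrinsic reward, here is a minimal toy sketch; every name, constant, and update rule in it is an assumption made for illustration, not the authors' HiLiTE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 5 states, 2 actions; action 1 earns more reward but incurs cost.
N_STATES, N_ACTIONS = 5, 2
COST_BUDGET = 3.0   # assumed per-episode cumulative cost limit
PENALTY = 10.0      # assumed weight of the intrinsic constraint penalty

def step(state, action):
    reward = 1.0 if action == 1 else 0.2
    cost = 1.0 if action == 1 else 0.0
    next_state = rng.integers(N_STATES)
    return next_state, reward, cost

Q = np.zeros((N_STATES, N_ACTIONS))   # value of the shaped (intrinsic) reward
C = np.zeros((N_STATES, N_ACTIONS))   # forward cost estimate (expected cost-to-go)
alpha, gamma, eps = 0.1, 0.95, 0.1

def shaped_reward(reward, backward_cost, cost_to_go):
    """Penalize actions whose accrued cost (backward) plus expected remaining
    cost (forward) is projected to exceed the budget."""
    overshoot = max(0.0, backward_cost + cost_to_go - COST_BUDGET)
    return reward - PENALTY * overshoot

for episode in range(500):
    s = rng.integers(N_STATES)
    backward_cost = 0.0                # cost accumulated so far in this episode
    for t in range(20):
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, c = step(s, a)
        backward_cost += c
        r_int = shaped_reward(r, backward_cost, C[s, a])
        # Standard TD(0) updates for the shaped reward and the forward cost value.
        Q[s, a] += alpha * (r_int + gamma * Q[s2].max() - Q[s, a])
        C[s, a] += alpha * (c + gamma * C[s2].min() - C[s, a])
        s = s2
```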


Similar Resources

Resource Constrained Exploration in Reinforcement Learning

This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
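The snippet above describes estimating the value function with Gaussian Process regression so that the posterior uncertainty can direct exploration. Below is a minimal sketch of that general idea, assuming a toy 1-D task and scikit-learn's GaussianProcessRegressor; it is not the paper's actual algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Collect (state, Monte-Carlo return) pairs from a toy 1-D task where the
# return is higher near a hypothetical "energy source" at x = 0.7.
states = rng.uniform(0.0, 1.0, size=(40, 1))
returns = np.exp(-((states[:, 0] - 0.7) ** 2) / 0.02) + 0.05 * rng.standard_normal(40)

# GP posterior over the value function; the predictive variance can then
# drive directed exploration (visit states where the value is most uncertain).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-2)
gp.fit(states, returns)

query = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mean, std = gp.predict(query, return_std=True)
next_state_to_visit = query[np.argmax(std), 0]   # most uncertain state
print(f"most uncertain state: {next_state_to_visit:.2f}")
```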


Learning to soar: Resource-constrained exploration in reinforcement learning

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcem...
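For reference, the backbone that eGP-SARSA(λ) builds on is SARSA(λ), a temporal-difference method with eligibility traces. A minimal tabular sketch of plain SARSA(λ) on a toy chain task follows (the GP-based variant in the paper replaces the table with a Gaussian Process and adds directed exploration); all names and constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N_STATES, N_ACTIONS = 10, 3
alpha, gamma, lam, eps = 0.1, 0.95, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))

def policy(s):
    # epsilon-greedy action selection
    return rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))

def step(s, a):
    # Toy chain dynamics: actions move left / stay / right; reward at the end.
    s2 = min(N_STATES - 1, max(0, s + (a - 1)))
    r = 1.0 if s2 == N_STATES - 1 else -0.01
    return s2, r, s2 == N_STATES - 1

for episode in range(200):
    E = np.zeros_like(Q)            # eligibility traces
    s, a = 0, policy(0)
    for t in range(100):
        s2, r, done = step(s, a)
        a2 = policy(s2)
        delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
        E[s, a] += 1.0              # accumulating traces
        Q += alpha * delta * E      # propagate the TD error backwards
        E *= gamma * lam
        if done:
            break
        s, a = s2, a2
```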


Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov Decision Process (MDP) for which an upper bound c on the span of the optimal bias function is known. For an MDP with S states, A actions and Γ ≤ S possible next states, we prove a regret bound of Õ(c√(ΓSAT)), which significantly improves over existing algorithms (e....
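Written out with its symbols, the regret bound quoted in the snippet is the following (T here is taken to be the number of time steps, as is standard for regret bounds):

```latex
% S = number of states, A = number of actions, \Gamma \le S = possible next
% states, T = time steps, c = known upper bound on the span of the optimal bias.
\[
  \mathrm{Regret}(\mathrm{SCAL}, T) = \tilde{O}\!\left(c\,\sqrt{\Gamma S A T}\right)
\]
```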


Efficient Exploration in Reinforcement Learning

An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must make the next choice of action so as to maximize the long term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with res...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i12.26757